# Non-commercial Research Use
DAM-3B
Other
DAM-3B is a 3-billion-parameter vision-language model capable of generating fine-grained local descriptions for user-specified image regions.
Image-to-Text
Safetensors English
nvidia
1,417
81
Japanese Instructblip Alpha
Other
A vision-language instruction-following model that generates Japanese descriptions of input images, optionally guided by a text prompt.
Image-to-Text
Transformers Japanese

stabilityai
141
54
Wav2vec2 Large Robust 12 Ft Emotion Msp Dim
This model is fine-tuned from Wav2Vec2-Large-Robust for speech emotion recognition, predicting values in three dimensions: arousal, dominance, and valence.
Audio Classification
Transformers English

audeering
394.51k
109